home *** CD-ROM | disk | FTP | other *** search
-
- Here's a technical description of the High Performance Files System
- that was posted on the net recently.
-
- ***** Computer Library Periodicals, Jan 1990 : Doc #14753 *****
-
- Journal: Microsoft Systems Journal Sept 1989 v4 n5 p1(13)
- * Full Text COPYRIGHT Microsoft Corp. 1989.
- -----------------------------------------------------------------------------
- Title: Design goals and implementation of the new High Performance File
- System. (includes related article on B-Trees and B+ Trees)
- Author: Duncan, Roy.
-
- Summary: The High Performance File System (HPFS) enhancement to OS/2
- Version 1.2 solves all the problems of the File Allocation Table (FAT) file
- system and is designed to meet the demands expected into the next few
- decades. HPFS not only serves as a way to organize data on random access
- block storage devices, but is also a software module that translates
- file-oriented requests from applications programs to device drivers. HPFS is
- also an example of an installable file system, which makes it possible to
- access several incompatible volume structures on the same OS/2 system
- simultaneously. Excellent throughput is achieved by the use of advanced data
- structures such as intelligent caching, read-ahead and write-behind. Disk
- space is managed more economically by the use of sectoring. HPFS also
- includes greatly improved fault tolerance. Applications programs need only
- simple modifications to make use of extended attributes and long filenames.
- -----------------------------------------------------------------------------
- Descriptors..
- Product: OS-2 Extended Edition 1.2 (Operating system) (product
- enhancement).
- Topic: OS-2
- File Management
- Enhancements
- Data Structures
- Disk Space Allocation
- Sectoring.
- Feature: illustration
- table
- chart.
- Caption: (Comparison of FAT and High Performance File System.)
- (Overall structure of an HPFS volume.)
- (Overall structure of an Fnode.)
-
- Record#: 07 585 454.
- -----------------------------------------------------------------------------
- Full Text:
-
- THE HPFS IS A WAY OF ORGANIZING DATA ON A RANDOM ACCESS BLOCK STORAGE DEVICE.
- IT IS ALSO A SOFTWARE MODULE THAT TRANSLATES FILE-ORIENTED REQUESTS FROM AN
- APPLICATION PROGRAM INTO MORE PRIMITIVE REQUESTS THAT A DEVICE DRIVER CAN
- UNDERSTAND.
-
- The High Performance File System (hereafter HPFS), which is making its first
- appearance in the OS/2 operating systemVersion 1.2, had its genesis in the
- network division of Microsoft and was designed by Gordon Letwin, the chief
- architect of the OS/2 operating system. The HPFS has been designed to meet
- the demands of increasingly powerful PCs, fixed disks, and networks for many
- years to come and to serve as a suitable platform for object-oriented
- languages, applications, and user interfaces.
-
- The HPFS is a complex topic because it incorporates three distinct yet
- interrelated file system issues. First, the HPFS is a way of organizing data
- on a random access block storage device. Second, it is a software module that
- translates file-oriented requests from an application program into more
- primitive requests that a device driver can understand, using a variety of
- creative techniques to maximize performance. Third, the HPFS is a practical
- illustration of an important new OS/2 feature known as installable file
- systems.
-
- This article introduces the three aspects of the HPFS. But first, it puts
- the HPFS in perspective by reviewing some of the problems that led to the
- system's existence.
-
- FAT File System
-
- The so-called FAT file system, which is the file system used in all versions
- of the MS-DOS"' operating system to date and in the first two releases of
- OS/2' (Versions 1.0 and 1.1), has a dual heritage in Microsoft's earliest
- programming language products and the Digital Research(R) CP/M(R) operating
- system-software originally written for 8080-based and Z-80-based
- microcomputers. It inherited characteristics from both ancestors that have
- progressively tumed into handicaps in this new era of multitasking, protected
- mode, virtual memory, and huge fixed disks.
-
- The FAT file system revolves around the File Allocation Table for which it is
- named. Each logical volume has its own FAT, which serves two important
- functions: it contains the allocation information for each file on the volume
- in the form of linked lists of allocation units (clusters, which are
- power-of-2 multiples of sectors) and it indicates which allocation units are
- free for assignment to a file that is being created or extended.
-
- The FAT was invented by Bill Gates and Marc McDonald in1977 as a method of
- managing disk space in the NCR version of standalone Microsoft* Disk BASIC.
- Tim Paterson, at that time an employee of Seattle (R) Computer Products
- (SCP), was introduced to the FAT concept when his company shared a booth with
- Microsoft at the National Computer Conference in 1979. Paterson subsequently
- incorporated FATs into the file system of 86-DOS, an operating system for
- SCP's S-100 bus 8086 CPU boards. 86-DOS was eventually purchased by
- Microsoft and became the starting point for MS-DOS2 Version 1.0, which was
- released for the original IBM" PC in August 1981.
-
- When the FAT was conceived, it was an excellent solution to disk management,
- mainly because the floppy disks on which it was used were rarely larger than
- I Mb. On such di s, the FAT was small enough to be held in memory at all
- times, allowing very fast random access to any part of any file. This proved
- far superior to the CP/M method of tracking disk space, in which the
- information about the sectors assigned to a file might be spread across many
- directory entries, which were in turn scattered randomly throughout the disk
- directory.
-
- When applied to fixed disks, however, the FAT began to look more like a bug
- than a feature. It became too large to be held entirely resident and had to
- be paged into memory in pieces; this paging resulted in many superfluous disk
- head movements as a program was reading through a file and degraded system
- throughput. In addition, because the information about free disk space was
- dispersed across many sectors of FAT, it was impractical to allocate file
- space contiguously, and file fragmentation became another obstacle to good
- performance. Moreover, the use of relatively large clusters on fixed disks
- resulted in a lot of dead space, since an average of one-half cluster was
- wasted for each file. (Some network servers use clusters as large as 64Kb.)
-
- The FAT file system's restrictions on naming files and directories are
- inherited from CP/M. When Paterson was writing 86DOS, one of his primary
- objectives was to make programs easy to port from CP/M to his new operating
- system. He therefore adopted CP/M's limits on filenames and extensions so
- the critical fields of 86-DOS File Control Blocks (FCBs) would look almost
- exactly like those of CP/M. The sizes of the FCB filename and extension
- fields were also propagated into the structure of disk directory entries. In
- due time, 86-DOS became MSDOS and application programs for MS-DOS
- proliferated beyond anyone's wildest dreams. Since most of the early
- programs depended on the structure of FCBs, the 8.3 format for filenames
- became irrevocably locked into the system.
-
- During the last couple of years, Microsoft and IBM have made valiant attempts
- to prolong the useful life of the FAT file system by lifting the restrictions
- on volume sizes, improving allocation strategies, caching pathnames, and
- moving tables and buffers into expanded memory. But these can only be
- regarded as temporizing measures, because the fundamental data structures
- used by the FAT file system are simply not well suited to large random access
- devices.
-
- The HPFS solves the FAT file system problems mentioned here and many others,
- but it is not derived in any way from the FAT file system. The architect of
- the HPFS started with a clean sheet of paper and designed a file system that
- can take full advantage of a multitasking environment, and that will be able
- to cope with any sort of disk device likely to arrive on microcomputers
- during the next decade.
-
- HPFS Volume Structure
-
- HPFS volumes are a new partition type-type 7-and can exist on a fixed disk
- alongside of the several previously defined FAT partition types.
- IBM-compatible HPFS volumes use a sector size of 512 bytes and have a maximum
- size of 2199Gb (2" sectors). Although there is no particular reason why
- floppy disks can't be formatted as HPFS volumes, Microsoft plans to stick
- with FAT file systems on floppy disks for the foreseeable future. (This
- ensures that users will be able to transport files easily between MS-DOS and
- OS/2 systems.)
-
- An HPFS volume has very few fixed structures (Figure 1). Sectors 0-15 of a
- volume (8Kb) are the BootBlock and contain a volume name, 32-bit volume ID,
- and a disk bootstrap program. The bootstrap is relatively sophisticated (by
- MS-DOS standards) and can use the HPFS in a restricted mode to locate and
- read the operating system files wherever they might be found.
-
- Sectors 16 and 17 are known as the SuperBlock and the SpareBlock
- respectively. The SuperBlock is only modified by disk maintenance utilities.
- It contains pointers to the free space bitmaps, the bad block list, the
- directory block band, and the root directory. It also contains the date that
- the volume was last checked out and repaired with CHKDSK/F. The SpareBlock
- contains various flags and pointers that will be discussed later; it is
- modified, although infrequently, as the system executes.
-
- The remainder of the disk is divided into 8Mb bands. Each band has its own
- free space bitmap in which a bit represents each sector. A bit is 0 if the
- sector is in use and I if the sector is available. The bitmaps are located
- at the head or tail of a band so that two bitmaps are adjacent between
- alternate bands. This allows the maximum contiguous free space that can be
- allocated to a file to be 16Mb. One band, located at or toward the seek
- center of the disk, is called the directory block band and receives special
- treatment (more about this later). Note that the band size is a
- characteristic of the current implementation and may be changed in later
- versions of the file system.
-
- Files and Fnodes
-
- Every file or directory on an HPFS volume is anchored on a fundamental file
- system object called an Fnode (pronounced "eff node"). Each Fnode occupies a
- single sector and contains control and access history information used
- internally by the file system, extended attributes and access control lists
- (more about this later), the length and the first 15 characters of the name
- of the associated file or directory, and an allocation structure (Figure 2).
- An Fnode is always stored near the file or directory that it represents.
-
- The allocation structure in the Fnode can take several forms, depending on
- the size and degree of contiguity of the file or directory. The HPFS views a
- file as a collection of one or more runs or extents of one or more contiguous
- sectors. Each run is symbolized by a pair of doublewords-a 32-bit starting
- sector number and a 32-bit length in sectors (this is referred to as
- runlength encoding). From an application program's point of view, the
- extents are invisible; the file appears as a seamless stream of bytes.
-
- The space reserved for allocation information in an Fnode can hold pointers
- to as many as eight runs of sectors of up to 16Mb each. (This maximum run
- size is a result of the band size and free space bitmap placement only; it is
- not an inherent limitation of the file system.) Reasonably small files or
- highly contiguous files can therefore be described completely within the
- Fnode (Figure 3).
-
- HPFS uses a new method to represent the location of files that are too large
- or too fragmented for the Fnode and consist of more than eight runs. The
- Fnode's allocation structure becomes the root for a B+ Tree of allocation
- sectors, which in tum contain the actual pointers to the file's sector runs
- (see Figure 4 and the sidebar, "BTrees and B+ Trees"). The Fnode's root has
- room for 12 elements. Each allocation sector can contain, in addition to
- various control information, as many as 40 pointers to sector runs.
- Therefore, a two-level allocation B+ Tree can describe a file of 480 (12*40)
- sector runs with a theoretical maximum size of 7.68Gb (12*40*16Mb) in the
- current implementation (although the 32-bit signed offset parameter for
- DosChgFilePtr effectively limits file sizes to 2Gb).
-
- In the unlikely event that a two-level B+ Tree is not sufficient to describe
- a highly fragmented file, the file system will introduce additional levels in
- the tree as needed. Allocation sectors in the intermediate levels can hold
- as many as 60 internal (nonterminal) B+ Tree nodes, which means that the
- descriptive ability of this structure rapidly grows to numbers that are
- nearly beyond comprehension. For example, a threelevel allocation B+ Tree
- can describe a file with as many as 28,800 (12*60*40) sector runs.
-
- Run-length encoding and B+ Trees of allocation sectors are a memory-efficient
- way to specify a file's size and location, but they have other significant
- advantages. Translating a logical file offset into a sector number is
- extremely fast: the file system just needs to traverse the list (or B+ Tree
- of lists) of run pointers until it finds the correct range. It can then
- identify the sector within the run with a simple calculation. Run-length
- encoding also makes it trivial to extend the file logically if the newly
- assigned sector is contiguous with the file's previous last sector; the file
- system merely needs to increment the size doubleword of the file's last run
- pointer and clear the sector's bit in the appropriate freespace bitmap.
-
- Directories
-
- Directories, like files, are anchored on Fnodes. A pointe to the Fnode for
- the root directory is found in the SuperBlock. The Fnodes for directories
- other than the root are reache through subdirectory entries in their parent
- directories.
-
- Directories can grow to any size and are built up from 2Kb directory blocks,
- which are allocated as four consecutive sectors on the disk. The file system
- attempts to allocate directory blocks in the directory band, which is located
- at or near the seek center of the disk. Once the directory band is full, the
- directory blocks are allocated wherever space is available.
-
- Each 2Kb directory block contains from one to many directory entries. A
- directory entry contains several fields, including time and date stamps, an
- Fnode pointer, a usage count for use by disk maintenance programs, the length
- of the file or directory name, the name itself, and a B-Tree pointer. Each
- entry begins with a word that contains the length of the entry. This
- provides for a variable amount of flex space at the end of each entry, which
- can be used by special versions of the file system and allows the directory
- block to be traversed very quickly (Figure 5).
-
- The nuniber of entries in a directory block varies with the length of names.
- If the average filename length is 1 3 characters, an average directory block
- will hold about 40 entries. The entries in a directory block are sorted by
- the binary lexical order of their name fields (this happens to put them in
- alphabetical order for the U.S. alphabet). The last entry in a directory
- block is a dummy record that marks the end of the block.
-
- When a directory gets too large to be stored in one block, it increases in
- size by the addition of 2Kb blocks that are organized as a B-Tree"B-Trees and
- B+ Trees"). When searching for a specific name, the file system traverses a
- directory block until it either finds a match or finds a name that is
- lexically greater than the target. In the latter case, the file system
- extracts the BTree pointer from the entry. If there is no pointer, the
- search failed; otherwise the file system follows the pointer to the next
- directory block in the tree and continues the search.
-
- A little back-of-the-envelope arithmetic yields some impressive statistics.
- Assuming 40 entries per block, a two-level tree of directory blocks can hold
- 1640 directory entries and a three-level tree can hold an astonishing 65,640
- entries. In other words, a particular file can be found (or shown not to
- exist) in a typical directory of 65,640 files with a maximum of three disk
- hits-the actual number of disk accesses depending on cache contents and the
- location of the file's name in the directory block B-Tree. That's quite a
- contrast to the FAT file system, where in the worst case more than 4000
- sectors would have to be read to establish that a file was or was not present
- in a directory containing the same number of files.
-
- The B-Tree directory structure has interesting implications beyond its effect
- on open and find operations. A file creation, renaming, or deletion may
- result in a cascade of complex operations, as directory blocks are added or
- freed or names are moved from one block to the other to keep the tree
- balanced. In fact, a rename operation could theoretically fail for lack of
- disk space even though the file itself is not growing. In order to avoid
- this sort of disaster, the HPFS maintains a small pool of free blocks that
- can be drawn from in a directory emergency; a pointer to this pool of free
- blocks is stored in the SpareBlock.
-
- Extended Attributes
-
- File attributes are information about a file that is maintained by the
- operating system outside the file's overt storage area. The FAT file system
- supports only a few simple attributes (read only, system, hidden, and
- archive) that are actually stored as bit flags in the file's directory entry;
- these attributes are inspected or modified by special function calls and are
- not accessible through the normal file open, read, and write calls.
-
- The HPFS supports the same attributes as the FAT file system for historical
- reasons, but it also supports a new form of fileassociated, highly
- generalized information called Extended Attributes (EAs). Each EA is
- conceptually similar to an environment variable, taking the form
-
- name- -value
-
- except that the value portion can be either a null-terminated (ASCIIZ) string
- or binary data. In OS/2 1.2, each file or directory can have a maximum of
- 64Kb of EAs attached to it. This limit may be lifted in a later release of
- OS/2.
-
- The storage method for EAs can vary. If the EAs associated with a given file
- or directory are small enough, they will be stored right in the Fnode. If
- the total size of the EAs is too large, they are stored outside the Fnode in
- sector runs, and a B+ Tree of allocation sectors can be created to describe
- the runs. If a single EA gets too large, it can be pushed outside the Fnode
- into a B+ Tree of its own.
-
- The kernel API functions DosQFileInfo and DosSetFileInfo have been expanded
- with new information levels that allow application programs to manipulate
- extended attributes for files. The new functions DosQPathInfo and
- DosSetPathInfo are used to read or write the EAs associated with arbitrary
- pathnames. An application program can either ask for the value of a specific
- EA (supplying a name to be matched) or can obtain all of the EAs for the file
- or directory at once.
-
- Although application programs can begin to take advantage of EAs as soon as
- the HPFS is released, support for EAs is an essential component in
- Microsoft's long-range plans for object-oriented file systems, Information of
- almost any type can be stored in EAs, ranging from the name of the
- application that owns the file to names of dependent files to icons to
- executable code. As the HPFS evolves, its facilities for manipulating EAs
- are likely to become much more sophisticated. It's easy to imagine, for
- example, that in future versions the API might be extended with EA functions
- that are analogous to DosFindFirst and DosFindNext and EA data might get
- organized into B -Trees.
-
- I should note here that in addition to EAs, the LAN Manager version of HPFS
- will support another class of file-associated information called Access
- Control Lists (ACLs). ACLs have the same general appearance as EAs and are
- manipulated in a similar manner, but they are used to store access rights,
- passwords, and other information of interest in a networking multiuser
- environment.
-
- Installable File Systems
-
- Support for installable file system has been one of the most eagerly
- anticipated features of OS/2 Version 1.2. It will make it possible to access
- multiple incompatible volume structures-FAT, HPFS, CD ROM, and perhaps even
- UNIX(R)--on the same OS/2 system at the same time, will simplify the life of
- network implementors, and will open the door to rapid file system evolution
- and innovation. Installable file systems are, however, only relevant to the
- HPFS insofar as they make use of the HPFS optional. The FAT file system is
- still embedded in the OS/2 kernel, as it was in OS/2 1.0 and 1.1, and will
- remain there as the compatibility file system for some time to come.
-
- An installable file system driver (FSD) is analogous in many ways to a device
- driver. An FSD resides on the disk in a file that is structured like a
- dynamic-link library (DLL), typically with a SYS or IFS extension, and is
- loaded during system initialization by IFS= statements in the CONFIG.SYS
- file. IFS= directives are processed in the order they are encountered and
- are also sensitive to the order of DEVICE= statements for device drivers.
- This lets you load a device driver for a nonstandard device, load a file
- system driver from a volume on that device, and so on.
-
- Once an FSD is installed and initialized, the kernel communicates with it in
- terms of logical requests for file opens, reads, writes, seeks, closes, and
- so on. The FSD translates these requests-using control structures and tables
- found on the volume itself-into requests for sector reads and writes for
- which it can call special kemel entry points called File System Helpers
- (FsHlps). The kemel passes the demands for sector 1/0 to the appropriate
- device driver and retums the results to the FSD (Figure 6).
-
- The procedure used by the operating system to associate volumes with FSDs is
- called dynamic mounting and works as follows. Whenever a volume is first
- accessed, or after it has been locked for direct access and then unlocked
- (for example, by a FORMAT operation), OS/2 presents identifying information
- from the volume to each of the FSDs in tum until one of them recognizes the
- information. When an FSD claims the volume, the volume is mounted and all
- subsequent file 1/0 requests for the volume are routed to that FSD.
-
- Performance Issues
-
- The HPFS attacks potential bottlenecks in disk throughput at multiple levels.
- It uses advanced data structures, contiguous sector allocation, intelligent
- caching, read-ahead, and deferred writes in order to boost performance.
-
- First, the HPFS matches its data structures to the task at hand:
- sophisticated data structures (B-Trees and B+ Trees) for fast random access
- to filenames, directory names, and lists of sectors allocated to files or
- directories, and simple compact data structures (bitmaps) for locating chunks
- of free space of the appropriate size. The routines that manipulate these
- data structures are written in assembly language and have been painstakingly
- tuned, with special focus on the routines that search the freespace bitmaps
- for patterns of set bits (unused sectors).
-
- Next, the HPFS's main goal -its prime directive, if you will- is to assign
- consecutive sectors to files whenever possible. The time required to move
- the disk's read/write head from one track to another far outweighs the other
- possible delays, so the HPFS works hard to avoid or minimize such head
- movements by allocating file space contiguously and by keeping control
- structures such as Fnodes and freespace bitmaps near the things they control.
- Highly contiguous files also help the file system make fewer requests of the
- disk driver for more sectors at a time, allow the disk driver to exploit the
- multisector transfer capabilities of the disk controller, and reduce the
- number of disk completion interrupts that must be serviced.
-
- Of course, trying to keep files from becoming fragmented in a multitasking
- system in which many files are being updated concurrently is no easy chore.
- One strategy the HPFS uses is to scatter newly created files across the
- disk-in separate bands, if possible-so that the sectors allocated to the
- files as they are extended will not be interleaved. Another strategy is to
- preallocate approximately 4Kb of contiguous space to the file each time it
- must be extended and give back any excess when the file is closed.
-
- If an application knows the ultimate size of a new file in advance, it can
- assist the file system by specifying an initial file allocation when it
- creates the file. The system will then search all the free space bitmaps to
- find a run of consecutive sec tors large enough to hold the file. That
- failing, it will search for two runs that are half the size of the file, and
- so on.
-
- The HPFS relies on several different kinds of caching to minimize the number
- of physical disk transfers it must request. Naturally, it caches sectors, as
- did the FAT file system. But unlike the FAT file system, the HPFS can manage
- very large caches efficiently and adjusts sector caching on a perhandle basis
- to the manner in which a file is used. The HPFS also caches pathnames and
- directories, transforming disk directory entries into an even more compact
- and efficient inmemory representation.
-
- Another technique that the HPFS uses to improve performance is to preread
- data it believes the program is likely to need. For example, when a file is
- opened, the file system will preread and cache the Fnode and the first few
- sectors of the file's contents. If the file is an executable program or the
- history information in the file's Fnode shows that an open operation has
- typically been followed by an immediate sequential read of the entire file,
- the file system will preread and cache much more of the file's contents.
- When a program issues relatively small read requests, the file system always
- fetches data from the file in 2Kb chunks and caches the excess, allowing most
- read operations to be satisfied from the cache.
-
- Finally, the OS/2 operating system's support for multitasking makes it
- possible for the HPFS to rely heavity on lazy writes (sometimes called
- deferred writes or write behind) to improve performance. When a program
- requests a disk write, the data is placed in the cache and the cache buffer
- is flagged as dirty (that is, inconsistent with the state of the data on
- disk). When the disk becomes idle or the cache becomes saturated with dirty
- buffers, the file system uses a captive thread from a daemon process to write
- the buffers to disk, starting with the oldest data.
-
- In general, lazy writes mean that programs run faster because their read
- requests will almost never be stalled waiting for a write request to
- complete. For programs that repeatedly read, modify, and write a small
- working set of records, it also means that many unnecessary or redundant
- physical disk writes may be avoided. Lazy writes have their dangers, of
- course, so a program can defeat them on a per-handle basis by setting the
- writethrough flag in the OpenMode parameter for DosOpen, or it can commit
- data to disk on a perhandle basis with the DosBufReset function.
-
- Fault Tolerance
-
- The HPFS's extensive use of lazy writes makes it imperative for the HPFS to
- be able to recover gracefully from write errors under any but the most dire
- circumstances. After all, by the time a write is known to have failed, the
- application has long since gone on its way under the illusion that it has
- safely shipped the data into disk storage. The errors may be detected by the
- hardware (such as a "sector not found" error returned by the disk adapter),
- or they may be detected by the disk driver in spite of the hardware during a
- read-after-write verification of the data.
-
- The primary mechanism for handling write errors is called a hotfix. When an
- error is detected, the file system takes a free block out of a reserved
- hotfix pool, writes the data to that block, and updates the hotfix map. (The
- hotfix map is simply a series of pairs of doublewords, with each pair
- containing the number of a bad sector associated with the number of its
- hotfix replacement. A pointer to the hotfix map is maintained in the
- SpareBlock.) A copy of the hotfix map is t written to disk, and a waming
- message is displayed to let the user know that all is not well with the disk
- device.
-
- Each time the file system requests a sector read or write from the disk
- driver, it scans the hotfix map and replaces any bad sector numbers with the
- corresponding good sector holding the actual data. This lookaside translation
- of sector numbers is not as expensive as it sounds, since the hotfix list
- need only be scanned when a sector is physically read or written, not each
- time it is accessed in the cache.
-
- One of CHKDSK's duties is to empty the hotfix map. For each replacement
- block on the hotfix map, it allocates a new sector that is in a favorable
- location for the file that owns the data, moves the data from the hotfix
- block to the newly allocated sector, and updates the file's allocation
- information (which may involve rebalancing allocation trees and other
- elaborate operations). It then adds the bad sector to the bad block list,
- releases the replacement sector back to the hotfix pool, deletes the hotfix
- entry from the hotfix map, and writes the updated hotfix map to disk.
-
- Of course, write errors that can be detected and fixed on the fly are not the
- only calamity that can befall a file system. The HPFS designers also had to
- consider the inevitable damage to be wreaked by power failures, program
- crashes, malicious viruses and Trojan horses, and those users who tum off the
- machine without selecting Shutdown in the Presentation Manager Shell.
- (Shutdown notifies the file system to flush the disk cache, update
- directories, and do whatever else is necessary to bring the disk to a
- consistent state.)
-
- The HPFS defends itself against the user who is too abrupt with the Big Red
- Switch by maintaining a Dirty FS flag in the SpareBlock of each HPFS volume.
- The flag is only cleared when all files on the volume have been closed and
- all dirty buffers in the cache have been written out or, in the case of the
- boot volume (since OS2.INI and the swap file are never closed), when Shutdown
- has been selected and has completed its work.
-
- During the OS/2 boot sequence, the file system inspects the DirtyFS flag on
- each HPFS volume and, if the flag is set, will not allow further access to
- that volume until CHKDSK has been run. If the DirtyFS flag is set on the boot
- volume, the system will refuse to boot; the user must boot OS/2 in
- maintenance mode from a diskette and run CHKDSK to check and possibly repair
- the boot volume.
-
- In the event of a truly major catastrophe, such as loss of the SuperBlock or
- the root directory, the HPFS is designed to give data recovery the best
- possible chance of success. Every type of crucial file object-including
- Fnodes, allocation sectors, and directory blocks-is doubly linked to both its
- parent and its children and contains a unique 32-bit signature. Fnodes also
- contain the initial portion of the name of their file or directory.
- Consequently, CHKDSK can rebuild an entire volume by methodically scanning
- the disk for Fnodes, allocation sectors, and directory blocks, using them to
- reconstruct the files and directories and finally regenerating the freespace
- bitmaps.
-
- Application Programs and the HPFS
-
- Each of the OS/2 releases thus far have carried with them a major
- discontinuity for application programmers who learned their trade in the
- MS-DOS environment. In OS/2 1.0, such programmers were faced for the first
- time with virtual memory, multitasking, interprocess communications, and the
- protected mode restrictions on addressing and direct control of the hardware
- and were challenged to master powerful new concepts such as threading and
- dynamic linking. In OS/2 Version 1.1, the stakes were raised even further.
- Programmers were offered a powerful hardware-independent graphical user
- interface but had to restructure their applications drastically for an
- event-driven environment based on objects and message passing.
-
- In OS/2 Version 1.2, it is time for many of the file-oriented programming
- habits and assumptions carried forward from the MS-DOS environment to fall by
- the wayside. An application that wishes to take full advantage of the HPFS
- must allow for long, free-form, mixed-case filenames and paths with few
- restrictions on punctuation and must be sensitive to the presence of EAs and
- ACLs. After all, if EAs are to be of any use, it won't suffice for
- applications to update a file by renaming the old file and creating a new one
- without also copying the EAs.
-
- But the necessary changes for OS/2 Version 1.2 are not tricky to make. A new
- API function, DosCopy, helps applications create backups-it essentially
- duplicates an existing file together with its EAs. EAs can also be
- manipulated explicitly with DosQFileInfo, DosSetFileInfo, DosQPathInfo, and
- DosSetPathInfo. A program should call DosQSysInfo at run time to find the
- maximum possible path length for the system, and ensure that all buffers used
- by DosChDir, DosQCurDir, and related functions are sufficiently large.
- Similarly, the buffers used by DosOpen, DosMove, D o s G e t M o d N a m e ,
- DosFindFirst, DosFindNext, and like functions must allow for longer
- filenames. Any logic that folds cases in filenames or tests for the
- occurrence of only one dot delimiter in a filename must be rethought or
- eliminated
-
- The other changes in the API will not affect the average application. The
- functions DosQFileInfo, DosFindFirst, and DosFindNext now retum all three
- sets of times and dates (created, last accessed, last modified) for a file on
- an HPFS volume, but few programs are concerned with time and date stamps
- anyway. DosQFsInfo is used to obtain volume labels or disk characteristics
- just as before, and the use of DosSetFsInfo for volume labels is unchanged.
- There are a few totally new API functions such as DosFsCtl (analogous to
- DosDev IOCtl but used for communication between an application and an FSD),
- DosFsAttach (a sort of explicit mount call), and DosQFsAttach (determines
- which FSD owns a volume); these are intended mainly for use by disk utility
- programs.
-
- In order to prevent old OS/2 applications and MS-DOS applications running in
- the DOS box from inadvertently damaging HPDS files, a new flag bit has been
- defined in the EXE file header that indicates whether an application is
- HPFS-aware. If this bit is not set, the application will only be able to
- search for, open, or create files on HPFS volumes that are compatible with
- the FAT file system's 8.3 naming conventions. If the bit is set, OS/2 allows
- access to all files on an HPFS volume, because it assumes that the program
- knows how to handle long, free-form filenames and will take the
- responsibility of conserving a file's EAs and ACLs.
-
- Summary
-
- The HPFS solves all of the historical problems of the FAT file system .It
- achieves excellent throughput even in extreme cases-many very small files or
- a few very large files-by means of advanced data structures and techniques
- such as intelligent caching, read-ahead, and write-behind. Disk space is used
- economically because it is managed on a sector basis. Existing application
- programs will need modification to take advantage of the HPFS's support for
- extended attributes and long filenames, but these changes will not be
- difficult. All application programs will benefit from the HPFS's improved
- performance and decreased CPU use whether they are modified or not. This
- article is based on a prerelease version of the HPFS that was still
- undergoing modification and tuning. therefore, the final release of the HPFS
- may differ in some details from the description given here--Ed.
-
- B- Trees and B+ Trees
-
- Most programmers are at least passingly familiar with the data structure
- known as a binary tree. Binary trees are a technique for imposing a logical
- ordering on a collection of data items by means of pointers, without regard
- to the physical order of the data.
-
- In a simple binary tree, each node contains some data, including a key value
- that determines the node's logical position in the tree, as well as pointers
- to the node's left and right subtrees. The node that begins the tree is known
- as the root; the nodes that sit at the ends of the tree's branches are
- sometimes called the leaves.
-
- To find a particular piece of data, the binary tree is traversed from the
- root, At each node, the desired key is compared with the node's key; if they
- don't match, one branch of the node's subtree or another is selected based on
- whether the desired key is less than or greater than the node's key. This
- process continues until a match is found or an empty subtree is encountered
- (see Figure A).
-
- Such simple binary trees, although easy to understand and implement, have
- disadvantages in practice. If keys are not well distributed or are added to
- the tree in a non-random fashion, the tree can become quite asymmetric,
- leading to wide variations in tree traversal times.
-
- In order to make access times uniform, many programmers prefer a particular
- type of balanced tree known as a B-Tree. For the purposes of this
- discussion, the important points about a B-Tree are that data is stored in
- all nodes, more than one data item might be stored in a node, and all of the
- branches of the tree are of identical length (see Figure B).
-
- The worst-case behavior of a B-Tree is predictable and much better than that
- of a simple binary tree, but the maintenance of a B-Tree is correspondingly
- more complex. Adding a new data item, changing a key value, or deleting a
- data item may result in the splitting or merging of a node, which in tum
- forces a cascade of other operations on the tree to rebalance it.
-
- A B+ Tree is a specialized form of B-Tree that has two types of nodes:
- internal which only point to other nodes, and external, which contain the
- actual data (see Figure C).
-
- The advantage of a B+ Tree over a B- Tree is that the internal nodes of the
- B+ Tree can hold many more decision values than the intermediate-level nodes
- of a B-Tree, so the fan out of the tree is faster and the average length of a
- branch is shorter. This makes up for the fact that you must always follow a
- B+ Tree branch to its end to get the data for which you are looking, whereas
- in a B-Tree you may discover the data at an intermediate node or even at the
- root.
-
-
- --
- Robert Kelley Cook rcook@darwin.cc.nd.edu
- Univ. of Notre Dame
- Physics Dept. If you think these are ND opinions, speak to the pope!
-
-
-